Robust Significance Testing in Sparse and High Dimensional Linear Models

Authors

  • Wenjing Yin
  • Jelena Bradic
Abstract

Classical statistical theory guarantees validity only under restrictive assumptions. In practice, however, statistical analysis is commonly based on data-driven model selection [1], for which none of the classical guarantees hold. Those guarantees cover hypothesis tests and confidence intervals, which are useful tools for measuring how well a model fits. Because too much about the true model underlying the data is unknown, we are unable to perform any testing before model selection. We are nevertheless interested in how well the selected model fits the data, which leads to the problem of testing after model selection. In this paper, we discuss robustness in testing after model selection with the lasso. The lasso, a relatively new estimation procedure, has not yet been thoroughly explored. In practice especially, one may want assurance that the chosen lasso model is appropriate at the assigned significance level. In recent decades, a few papers have addressed testing problems for the lasso. Among them, we take [2] as our reference and prove some of its lemmas in detail. These lemmas help us understand the properties of the test statistic derived in that paper, called the covariance test statistic, and its asymptotic distribution under the null hypothesis. We also determine the exact stopping time for selecting variables during the second step of the LARS algorithm. The exact stopping time allows us to propose a new LARS algorithm that is robust to outliers by using Kendall’s τ correlation coefficient. We mimic the successive nature of the well-known LARS algorithm and use the exact stopping time to select the second explanatory variable. Additionally, we propose a new test statistic that tests whether the selected variables are contained in the support of the true model.
The test statistic compares the covariance between the model selected before the stopping time and the model that includes an additional feature, in terms of Kendall’s τ correlation coefficient. Furthermore, we find a connection between our new test statistic and the Wilcoxon rank-sum test and use that connection to study its distributional properties. The analysis is complicated by the intricate dependencies present in the proposed test statistic. We conjecture that the new test statistic asymptotically follows a rescaled normal distribution under the null. We also design a simulation to illustrate the finite-sample properties of our new test statistic. We observe that its finite-sample behavior is more stable than that of the existing covariance test statistic.
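
To give a feel for why a rank-based coefficient can make a LARS-type entry step less sensitive to outliers, here is a minimal sketch. It is not the authors' algorithm: it only illustrates the idea of choosing the first entering variable by the largest absolute association with the response, once with Pearson correlation (as in standard LARS on standardized data) and once with Kendall's τ. The function names, the tie-ignoring τ-a variant, and the toy data are all assumptions made for illustration.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.
    Tied pairs contribute zero (an illustrative choice, not from the paper)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def pearson(x, y):
    """Ordinary Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def first_entering_variable(columns, y, corr):
    """Index of the column with the largest absolute association with y."""
    scores = [abs(corr(col, y)) for col in columns]
    return max(range(len(columns)), key=scores.__getitem__)


# Hypothetical toy data: y follows x0 exactly, except for one gross outlier
# in the first observation. x1 is an uninformative alternating column that
# happens to be extreme in that same observation.
n = 20
x0 = list(range(n))
y = [1000] + list(range(1, n))
x1 = [100] + [1 if i % 2 == 1 else -1 for i in range(1, n)]
columns = [x0, x1]

print("Pearson picks column:", first_entering_variable(columns, y, pearson))
print("Kendall picks column:", first_entering_variable(columns, y, kendall_tau))
```

On this toy data the single contaminated observation is enough to make the Pearson-based rule select the uninformative column 1, while the Kendall-based rule still selects column 0, whose τ with the response stays at 0.8. Later steps of a robust LARS procedure, the exact stopping time, and the proposed test statistic are not reproduced here.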

Similar resources

Robust Estimation in Linear Regression with Multicollinearity and Sparse Models

One of the factors affecting the statistical analysis of the data is the presence of outliers. The methods which are not affected by the outliers are called robust methods. Robust regression methods are robust estimation methods of regression model parameters in the presence of outliers. Besides outliers, the linear dependency of regressor variables, which is called multicollinearity...


Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: With the advance of science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...


Fixed effects testing in high-dimensional linear mixed models

Many scientific and engineering challenges – ranging from pharmacokinetic drug dosage allocation and personalized medicine to marketing mix (4Ps) recommendations – require an understanding of the unobserved heterogeneity in order to develop the best decision-making processes. In this paper, we develop a hypothesis test and the corresponding p-value for testing for the significance of the homoge...


Hyperspectral Image Classification Based on the Fusion of the Features Generated by Sparse Representation Methods, Linear and Non-linear Transformations

The ability to record the high-resolution spectral signature of the Earth's surface is the most important feature of hyperspectral sensors. Classification of hyperspectral imagery, in turn, is one of the methods for extracting information from these remote sensing data sources. Despite the high potential of hyperspectral images in terms of information content, there...


Computationally Efficient Robust Estimation of Sparse Functionals

Many conventional statistical procedures are extremely sensitive to seemingly minor deviations from modeling assumptions. This problem is exacerbated in modern high-dimensional settings, where the problem dimension can grow with and possibly exceed the sample size. We consider the problem of robust estimation of sparse functionals, and provide a computationally and statistically efficient algor...


Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease

Background and purpose: Machine learning is a class of modern, powerful tools that can solve many important problems people face today. Support vector regression (SVR), a notable member of the machine learning family, is a way to build a regression model. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...


Publication date: 2015